Shallow Language Processing Architecture for Bulgarian
Abstract
This paper describes LINGUA, an architecture for text processing in Bulgarian. First, the pre-processing modules for tokenisation, sentence splitting, paragraph segmentation, part-of-speech tagging, clause chunking and noun phrase extraction are outlined. Next, the paper describes the anaphora resolution module in more detail. Evaluation results are reported for each processing task.
Similar resources
A Hybrid Approach for Deep Machine Translation
This paper presents a Hybrid Approach to Deep Machine Translation in the language direction from English to Bulgarian. The set-up uses pre- and post-processing modules as well as two-level transfer. The language resources that have been incorporated are: WordNets for both languages; a valency lexicon for Bulgarian; aligned parallel corpora. The architecture comprises a predominantly statistical c...
Verb Valency Descriptors for a Syntactic Treebank
An essential component of Language Engineering (LE) tools are verb class descriptors that provide information about the relations of the predicates to their arguments. The production of computationally tractable language resources necessitates the assignment of types of predicate-argument relations to a great variety of verb-centered structures: it is necessary to define not only the initial, c...
An XML Architecture for Shallow and Deep Processing
The paper presents a set of XML tools for natural language processing such as regular grammars, constraints, transformations, remove and insert operations. The architecture allows any combinations of the tools depending on the task and the concrete analysis. The main control mechanism is the backtracking which depends on achieving a particular subgoal in the analysis. The main advantage of the ...
Integrating deep and shallow natural language processing components: representations and hybrid architectures
We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processin...
Multilingual summarization system based on analyzing the discourse structure at MultiLing 2013
This paper describes the architecture of UAIC's Summarization system participating at MultiLing 2013. The architecture includes language independent text processing modules, but also modules that are adapted for one language or another. In our experiments, the languages under consideration are Bulgarian, German, Greek, English, and Romanian. Our method exploits the cohesion and coherence p...